{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the seventh and final blog post of [Object Detection with YOLO blog series](https://fairyonice.github.io/tag/object-detection-using-yolov2-on-pascal-voc2012-series.html). This blog performs inference using the model in trained in [Part 5 Object Detection with Yolo using VOC 2012 data - training](https://fairyonice.github.io/Part_5_Object_Detection_with_Yolo_using_VOC_2012_data_training.html).\n",
    "I will use PASCAL VOC2012 data. \n",
    "This blog assumes that the readers have read the previous blog posts - [Part 1](https://fairyonice.github.io/Part_1_Object_Detection_with_Yolo_for_VOC_2014_data_anchor_box_clustering.html), [Part 2](https://fairyonice.github.io/Part%202_Object_Detection_with_Yolo_using_VOC_2014_data_input_and_output_encoding.html), [Part 3](https://fairyonice.github.io/Part_3_Object_Detection_with_Yolo_using_VOC_2012_data_model.html), [Part 4](https://fairyonice.github.io/Part_4_Object_Detection_with_Yolo_using_VOC_2012_data_loss.html), [Part 5](https://fairyonice.github.io/Part_5_Object_Detection_with_Yolo_using_VOC_2012_data_training.html), [Part 6](https://fairyonice.github.io/Part_6_Object_Detection_with_Yolo_using_VOC_2012_data_inference_image.html)\n",
    "\n",
    "## Andrew Ng's YOLO lecture\n",
    "- [Neural Networks - Bounding Box Predictions](https://www.youtube.com/watch?v=gKreZOUi-O0&t=0s&index=7&list=PL_IHmaMAvkVxdDOBRg2CbcJBq9SY7ZUvs)\n",
    "- [C4W3L06 Intersection Over Union](https://www.youtube.com/watch?v=ANIzQ5G-XPE&t=7s)\n",
    "- [C4W3L07 Nonmax Suppression](https://www.youtube.com/watch?v=VAo84c1hQX8&t=192s)\n",
    "- [C4W3L08 Anchor Boxes](https://www.youtube.com/watch?v=RTlwl2bv0Tg&t=28s)\n",
    "- [C4W3L09 YOLO Algorithm](https://www.youtube.com/watch?v=9s_FpMpdYW8&t=34s)\n",
    "\n",
    "\n",
    "## Reference\n",
    "- [You Only Look Once:Unified, Real-Time Object Detection](https://arxiv.org/pdf/1506.02640.pdf) \n",
    "\n",
    "- [YOLO9000:Better, Faster, Stronger](https://arxiv.org/pdf/1612.08242.pdf)\n",
    " \n",
    "- [experiencor/keras-yolo2](https://github.com/experiencor/keras-yolo2)\n",
    "\n",
    "## Reference in my blog\n",
    "- [Part 1 Object Detection using YOLOv2 on Pascal VOC2012 - anchor box clustering](https://fairyonice.github.io/Part_1_Object_Detection_with_Yolo_for_VOC_2014_data_anchor_box_clustering.html)\n",
    "- [Part 2 Object Detection using YOLOv2 on Pascal VOC2012 - input and output encoding](https://fairyonice.github.io/Part%202_Object_Detection_with_Yolo_using_VOC_2014_data_input_and_output_encoding.html)\n",
    "- [Part 3 Object Detection using YOLOv2 on Pascal VOC2012 - model](https://fairyonice.github.io/Part_3_Object_Detection_with_Yolo_using_VOC_2012_data_model.html)\n",
    "- [Part 4 Object Detection using YOLOv2 on Pascal VOC2012 - loss](https://fairyonice.github.io/Part_4_Object_Detection_with_Yolo_using_VOC_2012_data_loss.html)\n",
    "- [Part 5 Object Detection using YOLOv2 on Pascal VOC2012 - training](https://fairyonice.github.io/Part_5_Object_Detection_with_Yolo_using_VOC_2012_data_training.html)\n",
    "- [Part 6 Object Detection using YOLOv2 on Pascal VOC 2012 data - inference on image](https://fairyonice.github.io/Part_6_Object_Detection_with_Yolo_using_VOC_2012_data_inference_image.html)\n",
    "- [Part 7 Object Detection using YOLOv2 on Pascal VOC 2012 data - inference on video](https://fairyonice.github.io/Part_7_Object_Detection_with_Yolo_using_VOC_2012_data_inference_video.html)\n",
    "\n",
    "## My GitHub repository \n",
    "This repository contains all the ipython notebooks in this blog series and the funcitons (See backend.py). \n",
    "- [FairyOnIce/ObjectDetectionYolo](https://github.com/FairyOnIce/ObjectDetectionYolo)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3.6.3 |Anaconda, Inc.| (default, Oct  6 2017, 12:04:38) \n",
      "[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]\n"
     ]
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import os, sys\n",
    "print(sys.version)\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Read in the hyperparameters to define the YOLOv2 model used during training "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "train_image_folder = \"../ObjectDetectionRCNN/VOCdevkit/VOC2012/JPEGImages/\"\n",
    "train_annot_folder = \"../ObjectDetectionRCNN/VOCdevkit/VOC2012/Annotations/\"\n",
    "\n",
    "LABELS = ['aeroplane',  'bicycle', 'bird',  'boat',      'bottle', \n",
    "          'bus',        'car',      'cat',  'chair',     'cow',\n",
    "          'diningtable','dog',    'horse',  'motorbike', 'person',\n",
    "          'pottedplant','sheep',  'sofa',   'train',   'tvmonitor']\n",
    "\n",
    "ANCHORS = np.array([1.07709888,  1.78171903,  # anchor box 1, width , height\n",
    "                    2.71054693,  5.12469308,  # anchor box 2, width,  height\n",
    "                   10.47181473, 10.09646365,  # anchor box 3, width,  height\n",
    "                    5.48531347,  8.11011331]) # anchor box 4, width,  height\n",
    "\n",
    "\n",
    "BOX               = int(len(ANCHORS)/2)\n",
    "TRUE_BOX_BUFFER   = 50\n",
    "IMAGE_H, IMAGE_W  = 416, 416\n",
    "GRID_H,  GRID_W   = 13 , 13"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define model\n",
    "Load the weights trained in [Part 5](https://fairyonice.github.io/Part_5_Object_Detection_with_Yolo_using_VOC_2012_data_training.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/yumikondo/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
      "  from ._conv import register_converters as _register_converters\n",
      "Using TensorFlow backend.\n"
     ]
    }
   ],
   "source": [
    "from backend import define_YOLOv2\n",
    "\n",
    "CLASS             = len(LABELS)\n",
    "model, _          = define_YOLOv2(IMAGE_H,IMAGE_W,GRID_H,GRID_W,TRUE_BOX_BUFFER,BOX,CLASS, \n",
    "                                  trainable=False)\n",
    "model.load_weights(\"weights_yolo_on_voc2012.h5\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read in the mp4 video"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "8024 360 480\n"
     ]
    }
   ],
   "source": [
    "import cv2\n",
    "video_inp = \"beyonce.mp4\"\n",
    "video_out = \"beyonce_yolo.mp4\"\n",
    "\n",
    "video_reader = cv2.VideoCapture(video_inp)\n",
    "\n",
    "nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))\n",
    "frame_h   = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))\n",
    "frame_w   = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))\n",
    "print(nb_frames,frame_h,frame_w)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 1000/8024\n",
      " 1100/8024\n",
      " 1200/8024\n",
      " 1300/8024\n",
      " 1400/8024\n",
      " 1500/8024\n",
      " 1600/8024\n",
      " 1700/8024\n",
      " 1800/8024\n",
      " 1900/8024\n",
      " 2000/8024\n",
      " 2100/8024\n",
      " 2200/8024\n"
     ]
    }
   ],
   "source": [
    "from backend import ImageReader # from part 2 blog\n",
    "count    = 0\n",
    "min_count = 0#1000\n",
    "max_count = draw2200\n",
    "X_test   = []\n",
    "while count < max_count:\n",
    "    count += 1\n",
    "    ret, _image = video_reader.read()\n",
    "    if (count < min_count):\n",
    "        continue\n",
    "        \n",
    "    if count % 100 == 0:\n",
    "        print(\" {}/{}\".format(count,nb_frames))        \n",
    "    imageReader = ImageReader(IMAGE_H,\n",
    "                              IMAGE_W = IMAGE_W, \n",
    "                              norm    = lambda image : image / 255.)\n",
    "    _image      = imageReader.encode_core(_image)\n",
    "    X_test.append(_image)\n",
    "    \n",
    "X_test = np.array(X_test)\n",
    "\n",
    "video_reader.release()  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# For each video frame, detect objects with YOLO"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X_test = np.array(X_test)\n",
    "## model\n",
    "dummy_array    = np.zeros((len(X_test),1,1,1,TRUE_BOX_BUFFER,4))\n",
    "y_pred         = model.predict([X_test,dummy_array])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create video writer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "from backend import OutputRescaler, find_high_class_probability_bbox, draw_boxes,nonmax_suppression\n",
    "obj_threshold   = 0.03\n",
    "dir_png         = \"pngfolder\"\n",
    "outputRescaler  = OutputRescaler(ANCHORS=ANCHORS)\n",
    "#video_writer   = cv2.VideoWriter(video_out,\n",
    "#                                 cv2.VideoWriter_fourcc(*'mp4v'), # be sure to use lower case\n",
    "#                                 20.0, \n",
    "#                                 (frame_w, frame_h))\n",
    "\n",
    "for iframe in range(len(y_pred)):\n",
    "        netout       = y_pred[iframe] \n",
    "        image        = X_test[iframe]\n",
    "        # decoding YOLO output\n",
    "        netout_scale = outputRescaler.fit(netout)\n",
    "        boxes        = find_high_class_probability_bbox(netout_scale,obj_threshold)\n",
    "        if len(boxes) > 0:\n",
    "            final_boxes = nonmax_suppression(boxes,\n",
    "                                             iou_threshold = 0.3,\n",
    "                                             obj_threshold = obj_threshold)\n",
    "            if len(final_boxes) > 0: \n",
    "                image = draw_boxes(image,final_boxes,LABELS)\n",
    "        #video_writer.write(np.uint8(image))\n",
    "        plt.figure(figsize=(20,20))\n",
    "        plt.subplots_adjust(hspace=0.02,wspace=0.01, left=0,right=1,bottom=0, top=1) \n",
    "        plt.imshow(image)\n",
    "        plt.savefig(dir_png + \"/fig_{:04.0f}.png\".format(iframe),bbox_inches='tight',pad_inches=0)\n",
    "        plt.close()\n",
    "#video_writer.release()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Use ffmpeg to convert pngs to the mp4 video\n",
    "If you do not have ffmpeg, follow this tutorial to install it [ffmpeg installation](https://www.youtube.com/watch?v=8nbuqYw2OCw).\n",
    "\n",
    "Following the [suggestion in stackoverflow](https://superuser.com/questions/820134/why-cant-quicktime-play-a-movie-file-encoded-by-ffmpeg)\n",
    "From the terminal run:\n",
    "    \n",
    "    \n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "     ffmpeg -pattern_type glob -i \"fig_*.png\" -vcodec libx264 -s 640x480 -pix_fmt yuv420p movie.mp4   \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[FairyOnIce/ObjectDetectionYolo](https://github.com/FairyOnIce/ObjectDetectionYolo)\n",
    " contains this ipython notebook and all the functions that I defined in this notebook. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
